Inertial-aided Visual Perception of Geometry and Semantics
We describe components of a visual perception system that understands the geometry and semantics of a three-dimensional scene using monocular cameras and inertial measurement units (IMUs). The use of the two sensor modalities is motivated by the wide availability of camera-IMU sensor packages in mobile devices from phones to cars, and by their complementary sensing capabilities: IMUs can accurately track the motion of the sensor platform over a short period of time and provide a scaled, gravity-aligned global reference frame, while cameras can capture rich photometric signatures of the scene and provide relative motion constraints between images up to scale. We first show that visual 3D reconstruction can be improved by leveraging the global orientation frame, which is easily inferred from inertials. In the gravity-aligned global orientation frame, a shape prior can be imposed in depth prediction from a single image: the normal vectors to the surfaces of objects of certain classes tend to align with gravity or to be orthogonal to it. Adding such a prior to baseline methods for monocular depth prediction yields improvements beyond the state of the art and illustrates the power of utilizing inertials in 3D reconstruction. The global reference provided by inertials is not only gravity-aligned but also scaled, which we exploit in depth completion: we describe a method to infer dense metric depth from camera motion and sparse depth as estimated by a visual-inertial odometry system. Unlike scenarios using point clouds from lidar or structured-light sensors, we have only a few hundred to a few thousand points, insufficient to inform the topology of the scene. Our method first constructs a piecewise planar scaffolding of the scene, and then uses it to infer dense depth from the image along with the sparse points.
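As a minimal sketch of how such a gravity-aligned shape prior could be evaluated, one can estimate surface normals from a depth map and penalize normals that are neither parallel nor orthogonal to gravity. This is an illustration, not the authors' implementation; the intrinsics, back-projection, and penalty form are assumptions:

```python
import numpy as np

def surface_normals(depth, fx=500.0, fy=500.0):
    """Back-project a depth map (H x W) with assumed pinhole intrinsics
    and estimate per-pixel surface normals from finite differences."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project pixels to 3D camera coordinates.
    x = (u - w / 2) * depth / fx
    y = (v - h / 2) * depth / fy
    pts = np.dstack([x, y, depth])
    # Tangent vectors from neighboring back-projected points.
    du = pts[:, 1:, :] - pts[:, :-1, :]
    dv = pts[1:, :, :] - pts[:-1, :, :]
    n = np.cross(du[:-1, :, :], dv[:, :-1, :])
    norm = np.linalg.norm(n, axis=2, keepdims=True)
    return n / np.maximum(norm, 1e-8)

def gravity_prior_penalty(normals, gravity):
    """Illustrative prior: zero when a normal is parallel OR orthogonal
    to gravity, positive in between, via |cos t| * (1 - |cos t|)."""
    g = gravity / np.linalg.norm(gravity)
    cos_t = np.abs(normals @ g)
    return np.mean(cos_t * (1.0 - cos_t))
```

For a fronto-parallel plane (constant depth) with gravity along the camera's y-axis, every normal is orthogonal to gravity and the penalty vanishes.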
We use a predictive cross-modal criterion, akin to "self-supervision," measuring photometric consistency across time, forward-backward pose consistency, and geometric compatibility with the sparse point cloud. We also launch the first visual-inertial + depth dataset (dubbed "VOID"), which we hope will foster additional exploration into combining the complementary strengths of visual and inertial sensors. To compare our method to prior work, we adopt the unsupervised KITTI depth completion benchmark, and show state-of-the-art performance on it. In addition to dense geometry, the camera-IMU sensor package can also be used to recover the semantics of the scene. We present two methods to augment a point-cloud map with class-labeled objects represented either as scaled, oriented bounding boxes or as CAD models. The tradeoff between the two shape representations lies in their generality and their capacity to model detailed structure: while more generic, 3D bounding boxes fail to model the details of objects, whereas CAD models preserve the finest shape details but require more computation and are limited to previously seen objects. Nevertheless, both methods populate an unknown environment with 3D objects placed in a Euclidean reference frame inferred causally and on-line using monocular video along with inertial sensors. Both methods also include bottom-up and top-down components, whereby deep networks trained for detection provide likelihood scores for object hypotheses proposed by a nonlinear filter, whose state serves as memory.
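The three terms of such a cross-modal criterion can be outlined as simple per-term losses. This is an illustrative sketch, not the paper's implementation; the function names and the 4x4 SE(3) matrix representation of poses are assumptions:

```python
import numpy as np

def photometric_loss(img_ref, img_warped):
    """L1 photometric consistency between a reference frame and a
    neighboring frame warped into it via predicted depth and pose."""
    return np.mean(np.abs(img_ref - img_warped))

def pose_consistency_loss(T_fwd, T_bwd):
    """Forward-backward pose consistency: composing the forward and
    backward relative poses (4x4 SE(3) matrices) should give identity."""
    err = T_fwd @ T_bwd - np.eye(4)
    return np.linalg.norm(err)

def sparse_depth_loss(depth_pred, sparse_depth, mask):
    """Geometric compatibility with the metric sparse points from
    visual-inertial odometry, evaluated only where a point exists."""
    return np.mean(np.abs(depth_pred - sparse_depth)[mask])
```

A total training objective would be a weighted sum of the three terms; the weights would need to be tuned per dataset.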
We test our methods on the KITTI and SceneNN datasets, and also introduce the VISMA dataset, which contains ground-truth pose, a point-cloud map, and object models, along with time-stamped inertial measurements. To reduce the drift of the visual-inertial SLAM system -- a building block of all the visual perception systems we have built -- we introduce an efficient loop closure detection approach based on the idea of hierarchical pooling of image descriptors. We also open-source a full-fledged SLAM system equipped with mapping and loop closure capabilities. The code is publicly available at https://github.com/ucla-vision/xivo.
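The idea behind hierarchical pooling of image descriptors can be illustrated as follows: average-pool a dense local descriptor map over pyramid grids of increasing resolution, concatenate the cell averages into one global vector, and compare images by cosine similarity. The grid sizes and the threshold below are assumptions, not the values used in the system:

```python
import numpy as np

def hierarchical_pool(desc_map, levels=(1, 2, 4)):
    """Build a global image descriptor by average-pooling a dense local
    descriptor map (H x W x D) over a pyramid of grids (1x1, 2x2, 4x4
    here) and concatenating the L2-normalized result."""
    h, w, _ = desc_map.shape
    pooled = []
    for g in levels:
        for i in range(g):
            for j in range(g):
                cell = desc_map[i * h // g:(i + 1) * h // g,
                                j * w // g:(j + 1) * w // g]
                pooled.append(cell.mean(axis=(0, 1)))
    v = np.concatenate(pooled)
    return v / np.linalg.norm(v)

def is_loop_closure(v1, v2, threshold=0.9):
    """Flag a candidate loop closure when the cosine similarity of two
    global descriptors exceeds a threshold (value is an assumption)."""
    return float(v1 @ v2) > threshold
```

The coarse levels give invariance to small viewpoint changes, while the finer levels retain enough spatial layout to discriminate between similar-looking places.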
Theoretical Study on Relaxed Surrounding Rock Pressure on Shallow Bias Neighborhood Tunnels under Seismic Load
To study the distribution of relaxed surrounding rock pressure on shallow bias neighborhood tunnels under the combined action of horizontal and vertical earthquake forces, finite element software was used for failure-mode analysis. With the pseudo-static method, a calculation formula for the relaxed pressure on shallow bias neighborhood tunnels was then derived and used to analyze the variation of the rupture angles of these tunnels under seismic forcing. The study shows that shallow bias neighborhood tunnels basically follow a "W" failure pattern under the combined action of horizontal and vertical seismic forces, and that the failure scope of the surrounding rock is controlled by four rupture angles. Rupture angles β2 and β3, between the deep and shallow tunnels, are not affected by the surface slope. For tunnels with the same grade of surrounding rock, the greater the seismic intensity, the smaller the value of β2 and the greater the value of β3; at the same seismic intensity, the higher the grade of the surrounding rock, the smaller β2 and β3. Rupture angles β1 and β4 are influenced by the surface slope, the seismic intensity, and the surrounding rock grade: a steeper surface slope leads to a smaller β1 and a greater β4; β1 increases and β4 decreases with increasing seismic intensity; and both β1 and β4 show a decreasing trend with increasing rock grade.
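In the pseudo-static method used above, the earthquake is replaced by equivalent static body forces on the sliding rock mass, proportional to the horizontal and vertical seismic coefficients. A minimal sketch of that substitution (the coefficient values in the usage below are placeholders, not those of the study):

```python
import math

def pseudo_static_forces(weight, kh, kv):
    """Pseudo-static substitution: a rock wedge of weight W experiences
    an equivalent horizontal force F_h = kh * W and an equivalent
    vertical force F_v = kv * W instead of the dynamic load."""
    return kh * weight, kv * weight

def resultant_inclination(kh, kv):
    """Angle (degrees) by which the resultant body force tilts from the
    vertical under combined coefficients: atan(kh / (1 - kv))."""
    return math.degrees(math.atan2(kh, 1.0 - kv))
```

The tilt of the resultant body force is what shifts the rupture angles relative to the static case, consistent with the trends in β1 through β4 reported above.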
Towards Visual Foundational Models of Physical Scenes
We describe a first step towards learning general-purpose visual representations of physical scenes using only image prediction as a training criterion. To do so, we first define "physical scene" and show that, even though different agents may maintain different representations of the same scene, the underlying physical scene that can be inferred is unique. Then, we show that NeRFs cannot represent the physical scene, as they lack extrapolation mechanisms. Those, however, could be provided by Diffusion Models, at least in theory. To test this hypothesis empirically, NeRFs can be combined with Diffusion Models, a process we refer to as NeRF Diffusion, used as unsupervised representations of the physical scene. Our analysis is limited to visual data, without external grounding mechanisms that can be provided by independent sensory modalities.

Comment: TLDR: Physical scenes are equivalence classes of sufficient statistics, and can be inferred uniquely by any agent measuring the same finite data. We formalize and implement an approach to representation learning that overturns "naive realism" in favor of an analytical approach of Russell and Koenderink. NeRFs cannot capture the physical scenes, but combined with Diffusion Models they can.
Review of Recent Progress on Neural Electronics and Memcomputing Applications in Intrinsic SiOx-Based Resistive Switching Memory
In this chapter, we focus on recent progress in memcomputing (memristor + computing) in intrinsic SiOx-based resistive switching memory (ReRAM, also called a memristor). In the first section of the chapter, we investigate neuromorphic computing by mimicking synaptic behaviors in an integrated one-diode, one-resistive-switching-element (1D-1R) architecture. Power consumption in synaptic functions can be further minimized because the sneak-path current has been suppressed, and the capability for spike-induced synaptic behaviors has been demonstrated, representing critical milestones for the application of conventional SiOx-based materials in future advanced neuromorphic computing. In the next section of the chapter, we discuss an implementation technique for implication operations in logic-in-memory computation using a SiOx-based memristor. The implication function and its truth table have been implemented with a unipolar or nonpolar operation scheme. Furthermore, a circuit with 1D-1R architecture in a 4 × 4 crossbar array has been demonstrated, which realizes the functionality of a one-bit full adder equivalent to CMOS logic circuits but with a lower design-area requirement. This chapter suggests that a simple, robust approach to realizing memcomputing chips is compatible with large-scale CMOS manufacturing technology using an intrinsic SiOx-based memristor.
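Material implication (IMP), together with the constant FALSE, is functionally complete, which is why a memristive IMP gate suffices to build a full adder. A brief logical sketch of that completeness argument (the gate-level mapping is illustrative, not the chapter's crossbar circuit):

```python
def imp(p, q):
    """Material implication, the native operation of the memristive
    logic gate: p IMP q = (NOT p) OR q."""
    return (not p) or bool(q)

def nand(p, q):
    """IMP plus FALSE is functionally complete:
    NAND(p, q) = p IMP (q IMP FALSE)."""
    return imp(p, imp(q, False))

def full_adder(a, b, cin):
    """One-bit full adder built only from NAND gates, i.e. only from
    IMP and FALSE; the 9-NAND decomposition here is a standard one."""
    t1 = nand(a, b)
    t2 = nand(a, t1)
    t3 = nand(b, t1)
    s1 = nand(t2, t3)       # s1 = a XOR b
    t4 = nand(s1, cin)
    t5 = nand(s1, t4)
    t6 = nand(cin, t4)
    s = nand(t5, t6)        # sum = (a XOR b) XOR cin
    cout = nand(t4, t1)     # carry = (s1 AND cin) OR (a AND b)
    return s, cout
```

Each NAND costs two sequential IMP steps on the memristive array, so the adder's latency grows with the depth of this gate network rather than its gate count.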